{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# RNN (循环神经网络)\n",
    "\n",
    "以前我们写过的 Network 的结果都是反馈给下一层。\n",
    "\n",
    "那在RNN中，我们不仅要输出结果，还要输出一个**隐含状态**，给当前层在处理下一个样本时使用。\n",
    "\n",
    "结构图如下所示：\n",
    "\n",
    "![Recurrent Neural Networks have loops](../images/RNN/RNN-rolled.jpg)\n",
    "\n",
    "\n",
    "可以看到，跟普通的 Neural Network 相比，我们这里仅仅是多了一个**隐含状态**，那么这个多出来的**隐含状态**有什么好处呢？\n",
    "\n",
    "这个**隐含状态**，我们可以看作是上一个样本遗留给我们的信息，通过这个信息，我们可以把握过去样本的特征，结合当前的样本，组成一种类似于 “时间序列” 样本信息，从而让模型有了处理前后有依赖关系数据样本的能力。\n",
    "\n",
    "**举个例子:**\n",
    "\n",
    "语言模型的任务是给定句子的前 $t$ 个字符，然后预测第 $t+1$ 个字符。假设我们的句子是“你好世界”，使用循环神经网络来预测的一个做法是，在时间 $1$ 输入“你”，预测“好”同时生成一个**隐含状态**；\n",
    "\n",
    "在时间 $2$ 向同一个网络输入刚才生成的**隐含状态** 和 “好” 去预测“世”。同时输出一个更新过的隐含状态。\n",
    "\n",
    "对于这个**隐含状态**， 我们会希望前面的信息能够保存在这个隐含状态里，从而提升预测效果。下图展示了这个过程。\n",
    "\n",
    "![Recurrent Neural Networks have loops](../images/RNN/RNN.jpg)\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "# RNN的公式表示\n",
    "\n",
    "针对输入输出加上时间状态 $t$。在 $t$ 时刻，我们的输入为 $X_t$ 和 $H_{t-1}$，经过一个隐含层之后，我们有输出 $H_t$，经过最后的输出层得到 $\\hat{Y_t}$。\n",
    "\n",
    "其中 $X_t \\in R^{n \\times x}$ (样本数量为 $n$，每个样本特征向量维度是 $x$) ； $H_{t-1} \\in R^{n \\times h}$ （其中隐含层长度为 $h$）\n",
    "\n",
    "隐含层的计算公式为：\n",
    "\n",
    "$$\\mathbf{H}_t = \\phi(\\mathbf{X}_t \\mathbf{W}_{xh} + \\mathbf{H}_{t-1} \\mathbf{W}_{hh}  + \\mathbf{b}_h)$$\n",
    "\n",
    "其中 $\\mathbf{W}_{xh}$ 和 $\\mathbf{W}_{hh}$ 是隐含层的权重， $\\mathbf{b}_h$ 是隐含层的 bias。\n",
    "\n",
    "输出层的计算公式为：\n",
    "\n",
    "$$\\hat{\\mathbf{Y}}_t = \\text{softmax}(\\mathbf{H}_t \\mathbf{W}_{hy}  + \\mathbf{b}_y)$$\n",
    "\n",
    "其中 $\\mathbf{W}_{hy}$ 是输出层权重，$\\mathbf{b}_y$ 是输出层的 bias\n",
    "\n",
    "\n",
    "最开始的隐含状态里的元素通常会被初始化为0，也就是 $H_0$ 是个零矩阵。\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "循环神经网络的简单介绍就到这里了，如果想要更加细致的了解，可以看本人写过的一篇Blog [详细剖析RNN理论、变种和应用](https://lianhaimiao.github.io/2018/02/09/%E8%AF%A6%E7%BB%86%E5%89%96%E6%9E%90RNN%E7%90%86%E8%AE%BA%E3%80%81%E5%8F%98%E7%A7%8D%E5%92%8C%E5%BA%94%E7%94%A8/#more)\n",
    "\n",
    "# RNN的 PyTorch 实现\n",
    "\n",
    "理论部分我们已经了解了，那么下实际操作中，我们如何用 PyTorch 写一个RNN呢？\n",
    "\n",
    "\n",
    "答案很简单，就是一个函数...\n",
    "\n",
    "```\n",
    "torch.nn.RNN()\n",
    "```\n",
    "\n",
    "对的，你没看错，实现 RNN 就是这么简单。\n",
    "\n",
    "在 pyTroch 中你实现 RNN 模型的时候，可以传入的参数有：\n",
    "\n",
    "> input_size – The number of expected features in the input x\n",
    "\n",
    "> hidden_size – The number of features in the hidden state h\n",
    "\n",
    "> num_layers – Number of recurrent layers.\n",
    "\n",
    "> nonlinearity – The non-linearity to use [‘tanh’|’relu’]. Default: ‘tanh’\n",
    "\n",
    "> bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True\n",
    "\n",
    "> batch_first – If True, then the input and output tensors are provided as (batch, seq, feature)\n",
    "\n",
    "> dropout – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer\n",
    "\n",
    "> bidirectional – If True, becomes a bidirectional RNN. Default: False\n",
    "\n",
    "\n",
    "在这里我们需要重点关注的是 **input_size**、**hidden_size**、**batch_first**\n",
    "\n",
    "这三个分别表示，输入数据的特征大小、隐含层特征大小、batch放在第一位。\n",
    "\n",
    "模型实现之后，就是输入数据的格式了。如果在模型实现的时候选择了**batch_first=True**，那么在输入数据的格式就会是 (batch, seq_len, input_size)\n",
    "\n",
    "\n",
    "OK，整个RNN模型的 PyTorch 的理论部分，先介绍到这里，接下来的练习是**使用 RNN 模型来生成恐龙名字**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# 使用 RNN 模型来生成恐龙名字\n",
    "\n",
    "**PS：该练习的原始数据和想法来自于 吴恩达 Deep Learning 系列课程第五课**\n",
    "\n",
    "任务描述：\n",
    "\n",
    "现在，我们有一堆恐龙的名字，我们希望通过训练一个 RNN 模型，在我们给定一个字符的前提下，模型可以自动为我们生成一个酷炫的恐龙名字。\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "为了实现这个任务，我们一共分六步走：\n",
    "\n",
    "第一步：字符的数值表示 和 One-Hot Encoding\n",
    "\n",
    "第二步：构造训练集\n",
    "\n",
    "第三步：构造模型\n",
    "\n",
    "第四步：构建更新梯度和梯度剪裁的函数\n",
    "\n",
    "第五步：随机采样\n",
    "\n",
    "第六步：准备工作完成，开始训练\n",
    "\n",
    "\n",
    "其中，第四步，如果不愿意构建梯度裁剪函数，可以直接使用 torch.nn.optim 中提供的优化器来对模型进行更新。\n",
    "\n",
    "最后，在PyTorch中损失函数的定义一定要慎重！ 如果**损失函数是 NLLLoss(negative log likelihood loss)**模型的最后一层就需要使用 softmax，但是如果损失函数是CrossEntropyLoss，那么pytorch会自动给你加上Softmax()\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第一步：字符的数值表示 和 One-Hot Encoding\n",
    "\n",
    "因为字符没办法输入到模型里面，所以我们得先把数据里面所有不同的字符拿出来做成一个字典，然后可以把每个字符转成从0开始的索引(index)来方便之后的使用。\n",
    "\n",
    "需要注意的是我们一共有 27 个字符，出了小写字母 a-z(26个) 之外，我们还会有一个字符就是 \"\\n\" 这个字符在这里的含义是 < EOS \\> (or \"End of sentence\") \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "from torch.nn import functional as F\n",
    "from torch.autograd import Variable\n",
    "import numpy as np\n",
    "import random"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 构建 config 类来 控制所有参数\n",
    "class Config(object):\n",
    "    def __init__(self):\n",
    "        self.path = '../dataSet/dinos.txt'\n",
    "        self.data_size = None\n",
    "        self.vocab_size = None\n",
    "        self.char_to_ix = None\n",
    "        self.ix_to_char = None\n",
    "        self.output_size = None\n",
    "        self.epoch = 5000\n",
    "        self.lr = 0.015\n",
    "        self.hidden_size = 50\n",
    "        self.batch_size = 1\n",
    "        self.maxValue = 10 # 防止梯度爆炸所使用\n",
    "    \n",
    "config = Config()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "数据集中一共有 19909 个字符，不同的字符一共有 27 个。\n"
     ]
    }
   ],
   "source": [
    "# 读取数据\n",
    "data = open(config.path, 'r').read()\n",
    "# 字符全部小写\n",
    "data = data.lower()\n",
    "# 查看有多少种字符\n",
    "chars = list(set(data))\n",
    "# 27个字符，所以输出也是 27\n",
    "config.data_size, config.vocab_size, config.output_size = len(data), len(chars), len(chars)\n",
    "\n",
    "print('数据集中一共有 %d 个字符，不同的字符一共有 %d 个。' % (config.data_size, config.vocab_size))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{0: '\\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}\n"
     ]
    }
   ],
   "source": [
    "config.char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) } # 字符到索引的映射\n",
    "config.ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) } # 索引到字符的映射\n",
    "print(config.ix_to_char)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，我们已经得到了 字符转数字 和 数字转字符 的映射了，但是，这个现在的每个字符依旧是用一个整数表示的，而输入进网络我们需要一个定长的向量。一个常用的办法是使用 **One-Hot Encoding** 来将其表示成向量。也就是说，如果一个字符的整数值是 $i$ , 那么我们创建一个全0的长为 vocab_size 的向量，并将其第 $i$ 位设成1。该向量就是对原字符的 **one-hot 向量**。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def one_hot(ids, vocab_size):\n",
    "    \"\"\"\n",
    "        ids: list\n",
    "    \"\"\"\n",
    "    ids = torch.LongTensor(ids).view(-1, 1)\n",
    "    out = torch.zeros(ids.size()[0], vocab_size).scatter_(1, ids, 1)\n",
    "    return out"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第二步：构造训练集\n",
    "\n",
    "> 由于训练数据集中的数据长度不一致，所以，我们每次只训练一个样本。 当然有别的办法可以解决，但是在这个例子中，我们决定采用每次训练一个数据来训练模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def train_dataset(path, char_to_ix, vocab_size):\n",
    "    with open(path, 'r') as f:\n",
    "        examples = f.readlines()\n",
    "    examples = [x.lower().strip() for x in examples]\n",
    "    for index in range(len(examples)):\n",
    "        X = [char_to_ix[ch] for ch in examples[index]] \n",
    "        Y = X[1:] + [char_to_ix[\"\\n\"]]\n",
    "        i_d = one_hot(X, vocab_size)\n",
    "        input_data = Variable(i_d.view(1, i_d.size()[0], i_d.size()[1])) # batch*seq_len*input_size\n",
    "        target_data = Variable(torch.LongTensor(Y))\n",
    "        yield input_data, target_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第三步：构造模型\n",
    "\n",
    "我们使用最原始的RNN模型，构造的具体形式，如下图所示：\n",
    "\n",
    "![Dinasor RNN Model](../images/RNN/rnn_model.png)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 构造模型\n",
    "class DinosaurModel(nn.Module):\n",
    "    def __init__(self, input_size, hidden_size, output_size, batch_size, num_layers=1):\n",
    "        super(DinosaurModel, self).__init__()\n",
    "        self.hidden_size = hidden_size\n",
    "        self.output_size = output_size\n",
    "        self.input_size = input_size\n",
    "        self.batch_size = batch_size\n",
    "        self.num_layers = num_layers\n",
    "        # 就是一行就构建了 RNN 模型\n",
    "        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)\n",
    "        \n",
    "        self.fc  = nn.Linear(hidden_size, output_size)\n",
    "    \n",
    "    def forward(self, x, h0):\n",
    "        # Propagate input through RNN\n",
    "        # Input: (batch, seq_len, input_size)\n",
    "        # hidden: (batch, num_layers * num_directions, hidden_size)\n",
    "        # out: (batch, seq_len, hidden_size)\n",
    "        out, hidden = self.rnn(x, h0)\n",
    "        \n",
    "        # 把 rnn 的结果堆成 (batch*seq_len, hidden_size) 大小\n",
    "        out = out.view(out.size()[0]*out.size()[1], self.hidden_size)\n",
    "        y = self.fc(out)\n",
    "        return y\n",
    "    \n",
    "    def init_hidden(self):\n",
    "        # 初始化隐含层\n",
    "        return Variable(torch.zeros(self.batch_size, self.num_layers, self.hidden_size))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第四步：构建更新梯度和梯度剪裁的函数\n",
    "\n",
    "在这里，我们不选择使用优化器，为了能够实现梯度裁剪，我们使用最原始的梯度下降法，同时手动实现 更新梯度的过程。\n",
    "\n",
    "我们在训练循环神经网络的时候，当每个时序训练样本的时序长度 $T_x$ 较大或者时刻 $t$ 较小，那么目标函数有关 $t$ 时刻的隐含层变量梯度容易出现梯度消失(valishing) 或 梯度爆炸(explosion)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def clip_and_update(parameters, lr, maxValue):\n",
    "    for p in parameters:\n",
    "        gradients = torch.clamp(p.grad.data, min=-maxValue, max=maxValue)\n",
    "        p.data.add_(-lr, gradients)\n",
    "    return"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第五步：随机采样\n",
    "\n",
    "现在，假设我们的模型已经训练好了，我们需要生成新的文本（恐龙名字），生成的过程如下图所示：\n",
    "\n",
    "![Sampling](../images/RNN/sampling.png)\n",
    "\n",
    "\n",
    "我们把 $x^{\\langle 1 \\rangle} = \\vec{0}$ 当作第一步的输入，然后让该网络每一步生成一个字符，一直到我们生成了结尾符 < EOS > 在这里的结尾符就是 \"\\n\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def sample(model, char_to_ix, ix_to_char, vocab_size):\n",
    "    # 初始值\n",
    "    random_int = random.randint(0, vocab_size-1)\n",
    "    i_d = one_hot([random_int], vocab_size)\n",
    "    a_input = Variable(i_d).view(1, 1, vocab_size)\n",
    "    indices = []\n",
    "    idx = -1\n",
    "    counter = 0\n",
    "    eos = char_to_ix['\\n']\n",
    "    \n",
    "    while (idx != eos and counter != 15):\n",
    "        h0 = model.init_hidden()\n",
    "        out = model(a_input, h0)\n",
    "        # 通过 softmax 求出每个字符的概率\n",
    "        p = F.softmax(out)\n",
    "        # 取出概率最大的字符的位置\n",
    "        val, ids = torch.max(p, 1)\n",
    "        # 加入预测结果数组里面\n",
    "        idx = ids.data[0]\n",
    "        indices.append(idx)\n",
    "        a_input = Variable(one_hot([idx], vocab_size).view(1, 1, vocab_size))\n",
    "        counter += 1\n",
    "    if (counter == 30):\n",
    "        indices.append(char_to_ix['\\n'])\n",
    "    strl = [ix_to_char[i] for i in indices]\n",
    "    return \"\".join(strl)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第六步：准备工作完成，开始训练\n",
    "\n",
    "PS: 由于主要是讲述RNN，所以并没有特别认真的去训练，又兴趣的朋友可以试试 加上GPU， 同时多加训练次数，来尝试获取更好的结果\n",
    "\n",
    "训练效果显示，2000次训练并没有使得我们的模型真的学到如何取一个比较牛逼的恐龙名字，而且它不懂得进行结尾，易出现重复字母。\n",
    "\n",
    "那么我们该如何解决这样的问题呢？\n",
    "\n",
    "个人认为可以从以下几个方向考虑\n",
    "\n",
    "1、改变优化算法，从简单的梯度下降，换成 RMSProp 或者 Adam \n",
    "\n",
    "2、尝试修改 learning rate 但是感觉效果不一定好。\n",
    "\n",
    "3、增加训练次数。\n",
    "\n",
    "4、修改模型，使用 LSTM 来解决问题。\n",
    "\n",
    "5、增加Padding，让我们可以一次性训练一批数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "loss is 1.242449\n",
      "ranoranoranoran\n",
      "loss is 1.237935\n",
      "inoranoranorano\n",
      "loss is 1.270121\n",
      "inorunorunoruno\n",
      "loss is 1.273657\n",
      "uneuneuneuneune\n",
      "loss is 1.335257\n",
      "oneuneuneuneune\n",
      "loss is 1.309117\n",
      "alineoralineora\n",
      "loss is 1.272440\n",
      "uneuneuneuneune\n",
      "loss is 1.307746\n",
      "uneuneuneuneune\n",
      "loss is 1.261119\n",
      "ineuneuneuneune\n",
      "loss is 1.256379\n",
      "uteuteuteuteute\n"
     ]
    }
   ],
   "source": [
    "dinos = DinosaurModel(config.vocab_size, config.hidden_size, config.output_size, config.batch_size)\n",
    "\n",
    "# 有的朋友可能很好奇，明明就是softmax + Negtive log-likelihood function 为啥要用 CrossEntropyLoss 呢？\n",
    "# 因为 PyTorch 就是这样设计的啊~\n",
    "loss_fn = nn.CrossEntropyLoss()\n",
    "\n",
    "for i in range(config.epoch):    \n",
    "    # 数据准备\n",
    "    total_loss = []\n",
    "    for input_data, target_data in train_dataset(config.path, config.char_to_ix, config.vocab_size):\n",
    "        dinos.zero_grad()\n",
    "        # 隐含层初始化\n",
    "        h0 = dinos.init_hidden()\n",
    "        # 数据扔进模型里面\n",
    "        y_pred = dinos(input_data, h0)\n",
    "        # 求 loss\n",
    "        loss = loss_fn(y_pred, target_data)\n",
    "        total_loss.append(loss.data[0])\n",
    "        # 反向传播\n",
    "        loss.backward()\n",
    "        # 梯度剪裁同时更新模型\n",
    "        clip_and_update(dinos.parameters(), config.lr, config.maxValue)\n",
    "\n",
    "    if (i+1) % 500 == 0:\n",
    "        print(\"loss is %.6f\" %(np.mean(total_loss)))\n",
    "        # 采样\n",
    "        strl = sample(dinos, config.char_to_ix, config.ix_to_char, config.vocab_size)\n",
    "        print(strl)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
